Retinal metric: a stimulus distance measure derived from population neural responses 
The ability of an organism to distinguish between various stimuli is limited by the structure and noise in the population code of its sensory neurons. Here we infer a distance measure on the stimulus space directly from the recorded activity of 100 neurons in the salamander retina. In contrast to previously used measures of stimulus similarity, this "neural metric" tells us how distinguishable a pair of stimulus clips is to the retina, given the noise in the neural population response. We show that the retinal distance strongly deviates from Euclidean, or any static metric, yet has a simple structure: we identify the stimulus features that the neural population is jointly sensitive to, and show the SVM-like kernel function relating the stimulus and neural response spaces. We show that the non-Euclidean nature of the retinal distance has important consequences for neural decoding.



Neural populations convey information about their stimuli by their joint spiking patterns [1]. At the level of single cells, the mapping from stimuli to spikes has often been captured by linear-nonlinear (LN) models [2, 3]. Geometrically, a single LN neuron can be viewed as a perceptron [4], partitioning the stimulus space into two domains (one of stimuli that evoke spikes and one of stimuli that do not) by a decision boundary, or a hyperplane, determined by the linear feature of the LN model [5-7]. The brain, listening for spikes coming from such a neuron, will thus interpret stimuli as similar insofar as they evoke similar spiking patterns. But how does an interacting population, as a whole, partition the stimulus space? Conversely, which stimuli are interpreted as different, or similar, by an interacting population?

Answering these questions is fundamental to our understanding of the neural code and depends critically on finding the correct "metric" for sensory stimuli in terms of the information that neural populations carry. Since neurons are noisy, repeated presentations of the same stimulus can result in different neural responses, so the stimulus/response mapping of the population needs to be described by the probability distribution $P(\sigma|s)$ [8]. Two stimuli $s_1$ and $s_2$ may be far apart as measured by a chosen distance function, e.g. the Euclidean norm $D_2(s_1, s_2) = \|s_1 - s_2\|$, yet they could evoke responses drawn from almost overlapping distributions $P(\sigma|s_1)$ and $P(\sigma|s_2)$, making it nearly impossible for the brain, listening to the spikes $\sigma$ arriving from the sensory system, to tell those stimuli apart. Conversely, the sensory circuit could be sensitive to specific stimulus changes that have a small Euclidean norm, emphasizing those particular differences as an important feature and encoding them in the neural response. We therefore suggest that the biologically relevant distance between pairs of stimuli should be derived from the distance between the response distributions evoked by these stimuli [9, 10].
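To make the first half of this argument concrete, the Python sketch below uses a single hypothetical LN neuron with a logistic nonlinearity (an illustrative choice, not a model fitted to the recorded data). Two stimuli are separated by a step that is orthogonal to the neuron's linear filter, so they are far apart in Euclidean distance yet evoke the same spike probability. The names `spike_prob` and `k`, and the 40-dimensional clip, are assumptions made for illustration.

```python
import numpy as np

def spike_prob(s, k, threshold=0.0):
    """Spike probability of a hypothetical LN neuron: logistic function of k . s."""
    return 1.0 / (1.0 + np.exp(-(k @ s - threshold)))

rng = np.random.default_rng(0)
dim = 40                          # e.g. a 400 ms clip sampled in 10 ms bins
k = rng.normal(size=dim)
k /= np.linalg.norm(k)            # unit-norm linear filter

s1 = rng.normal(size=dim)
u = rng.normal(size=dim)
u -= (u @ k) * k                  # remove the component of u along k
s2 = s1 + 5.0 * u / np.linalg.norm(u)   # step 5 units orthogonally to k

euclid = np.linalg.norm(s1 - s2)  # Euclidean distance D_2 between the stimuli
p1, p2 = spike_prob(s1, k), spike_prob(s2, k)
print(f"D_2 = {euclid:.2f}, P(spike|s1) = {p1:.6f}, P(spike|s2) = {p2:.6f}")
```

Because this neuron's response distribution depends on the stimulus only through `k @ s`, the two stimuli are indistinguishable to it however large the orthogonal step is.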

To characterize the structure of neural distance in a large population, we extracellularly recorded the activity of $N = 100$ retinal ganglion cells (RGCs) in the tiger salamander using a multi-electrode array [11, 12]. The retina patch was presented 626 times with a 10 s long segment of spatially uniform flicker with Gaussian-distributed luminance drawn independently at 30 Hz (Fig. 1a,b). Time was discretized into $T = 961$ bins of 10 ms, and the joint response of the neurons, $\sigma = \{\sigma_a\}$, was represented in each bin by an $N$-bit codeword, where $\sigma_a = 1$ (0) denoted that neuron $a = 1, \ldots, N$ spiked (did not spike) in that bin. Since direct sampling of the conditional distribution $P(\sigma|s)$ is impractical for such a large population (even with hundreds of repeats, Fig. 1), we inferred a stimulus-dependent maximum entropy (SDME) model for this data that predicts $P(\sigma|s)$ for each time bin, as we report in detail in Ref. [13].
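The binning step described above can be sketched in Python as follows. This is a minimal illustration, assuming spike times are available per neuron in seconds; the input format and the function name `binarize` are our assumptions, and a bin containing more than one spike is simply marked as 1.

```python
import numpy as np

def binarize(spike_times, n_bins, bin_width=0.010):
    """Convert per-neuron spike times (seconds) into a binary (n_bins, N) array.

    sigma[t, a] = 1 if neuron a fired at least once in bin t, else 0,
    so each row sigma[t] is the N-bit population codeword for bin t.
    """
    n_neurons = len(spike_times)
    sigma = np.zeros((n_bins, n_neurons), dtype=np.uint8)
    for a, times in enumerate(spike_times):
        idx = np.floor(np.asarray(times, dtype=float) / bin_width).astype(int)
        idx = idx[(idx >= 0) & (idx < n_bins)]   # drop spikes outside the window
        sigma[idx, a] = 1
    return sigma
```

For example, `binarize([[0.005, 0.007], [0.015], [0.021, 0.5]], n_bins=3)` marks neuron 0 in bin 0, neuron 1 in bin 1, and neuron 2 in bin 2, and discards the spike at 0.5 s, which falls outside the three 10 ms bins.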

Since only differences in retinal responses can guide the organism's behavior, the biologically relevant distance between stimuli $s_1$ and $s_2$ must be a measure of similarity between their corresponding response distributions. We define the retinal distance between the stimuli as the symmetrized Kullback-Leibler divergence between the distributions of responses they elicit,

$$D_{\rm ret}(s_1, s_2) = D_{\rm KL}^{\rm sym}\big(P(\sigma|s_1), P(\sigma|s_2)\big), \tag{1}$$

where the symmetrized KL divergence is $D_{\rm KL}^{\rm sym}(p,q) = \frac{1}{2}\sum_x \left[p(x)\log_2\frac{p(x)}{q(x)} + q(x)\log_2\frac{q(x)}{p(x)}\right]$ [14].
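For small discrete distributions, the symmetrized KL divergence defined above can be evaluated directly, as in the Python sketch below. Enumerating all $2^N$ codewords is infeasible for $N = 100$, which is why the text evaluates it through the SDME model instead; this sketch only illustrates the formula. The function name and the requirement of strictly positive probabilities are our assumptions.

```python
import numpy as np

def sym_kl_bits(p, q):
    """Symmetrized KL divergence, in bits, between two discrete distributions.

    Implements D_sym(p, q) = 0.5 * sum_x [p log2(p/q) + q log2(q/p)].
    Both inputs must be strictly positive; they are renormalized to sum to 1.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or np.any(p <= 0) or np.any(q <= 0):
        raise ValueError("p and q must have the same shape and positive entries")
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * float(np.sum(p * np.log2(p / q)) + np.sum(q * np.log2(q / p)))

# Two distributions over four hypothetical response patterns
p = [0.7, 0.1, 0.1, 0.1]
q = [0.4, 0.2, 0.2, 0.2]
print(sym_kl_bits(p, q))
```

An equivalent form, useful as a check, is $\frac{1}{2}\sum_x (p(x) - q(x))\log_2\frac{p(x)}{q(x)}$, which makes the symmetry and non-negativity explicit.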
We choose this principled information-theoretic measure because it quantifies the difference between stimuli precisely to the extent that their response distributions are distinguishable [14, 15]. Once constructed, the analysis of $D_{\rm ret}$ should help us uncover the fundamental aspects of the stimulus space, in particular, whether the distance between pairs of stimuli is determined by a small number of features or stimulus projections.

FIG. 1: a) Stimulus segment with two 400 ms stimulus clips $s_1$ (red), $s_2$ (blue). b) Rasters for 100 RGCs shown for 3 example repeats; response vectors to $s_1$, $s_2$ shown between dashed lines. c) Measured response rasters to two highlighted stimulus clips ($s_1$, left; $s_2$, right; spikes = white dots; colored bars = neuron firing rates). d) For every pair of time bins in the experiment, the Euclidean distance $D_2$ between the corresponding stimulus clips is shown in the upper diagonal part of the matrix, and the retinal distance $D_{\rm ret}$ in the lower part. e) Four pairs of stimulus clips (green and violet) are selected to demonstrate that $D_2$ (increasing top to bottom) and $D_{\rm ret}$ (increasing left to right) are not monotonically related.

Using the SDME model for $P(\sigma|s)$ [30], we computed the retinal distance, $D_{\rm ret}$, between every pair of stimulus clips in the experiment. Figure 1d shows the profound difference between the Euclidean and retinal distances for all the pairs in a two-second interval of the stimulus. In particular, the same value of Euclidean distance can be obtained from very similar neural responses or from very different ones, as shown in Fig. 1e for selected pairs of stimulus clips.
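As a rough illustration of how a full $T \times T$ matrix of retinal distances can be assembled, the Python sketch below replaces the SDME model with a much simpler assumption: neurons that are conditionally independent given the stimulus, each described only by its spike probability in a clip. This ignores the correlations that the SDME model captures, so its values would differ from the $D_{\rm ret}$ reported here. For product-Bernoulli distributions the symmetrized KL divergence is a sum of per-neuron terms, which lets the whole matrix be computed with two matrix products. The `rates` input and the function name are hypothetical.

```python
import numpy as np

def retinal_distance_independent(rates, eps=1e-6):
    """Pairwise symmetrized KL (bits) between response distributions of T clips,
    ASSUMING conditionally independent binary neurons (not the SDME model).

    rates[i, a] = P(sigma_a = 1 | s_i), the spike probability of neuron a in clip i.
    For product-Bernoulli distributions:
        D[i, j] = 0.5 * sum_a (r_ia - r_ja) * (logit2(r_ia) - logit2(r_ja))
    """
    r = np.clip(np.asarray(rates, dtype=float), eps, 1 - eps)
    lg = np.log2(r / (1 - r))                 # base-2 logit of each spike probability
    # Expand the product to avoid a (T, T, N) intermediate array:
    # sum_a (r_i - r_j)(l_i - l_j) = r_i.l_i + r_j.l_j - r_i.l_j - r_j.l_i
    diag = np.sum(r * lg, axis=1)
    cross = r @ lg.T                          # cross[i, j] = r_i . l_j
    D = 0.5 * (diag[:, None] + diag[None, :] - cross - cross.T)
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)                 # clip tiny negative round-off
```

Each per-neuron term is non-negative because the logit is monotonic, so the matrix is symmetric, non-negative, and zero on the diagonal, as a distance matrix should be.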

To further explore the structure of retinal distance, we used multidimensional scaling (MDS) to project the distance matrix $D_{\rm ret}(s_i, s_j)$, between all pairs ($i, j = 1, \ldots, T$) of presented stimulus clips, into a low-dimensional space [16]. This embedding technique does not directly reveal the structure of the stimulus space, but can approximate its effective dimensionality. Every stimulus clip $s_i$ is assigned a $K$-dimensional point $v_i$ in Euclidean space, such that

$$D_{\rm ret}(s_i, s_j) \approx f(\|v_i - v_j\|), \tag{2}$$

with $f(\cdot)$ being a monotonic function. It is possible to find such accurate mappings for small $K$: Fig. 2a shows the strong correspondence between low-dimensional MDS projections and the retinal distance for different values of $K$. Fig. 2b summarizes the MDS performance at low orders in terms of the mutual information it captures about $D_{\rm ret}$; higher information values correspond to a tighter relationship with less scatter.
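The embedding and its evaluation can be sketched in Python as below. Two assumptions should be flagged. First, the sketch uses classical (Torgerson) metric MDS, which corresponds to taking $f$ in Eq. (2) to be the identity; Eq. (2) allows any monotonic $f$, so a non-metric MDS routine (for example, the one in scikit-learn's `sklearn.manifold` module) would match the text more closely. Second, the mutual information is estimated with a simple plug-in histogram estimator, which is biased upward for small samples; the estimator actually used in the text is not specified here. Both function names are hypothetical.

```python
import numpy as np

def classical_mds(D, K=2):
    """Classical (Torgerson) MDS: K-d points whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)              # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:K]           # keep the K largest
    scale = np.sqrt(np.clip(evals[order], 0.0, None))
    return evecs[:, order] * scale                # row i is the point v_i

def binned_mutual_information(x, y, bins=20):
    """Plug-in estimate (bits) of I(x; y) from a 2-d histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)           # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)           # marginal of y, shape (1, bins)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))
```

In use, one would embed the distance matrix with `V = classical_mds(D_ret, K)`, compute the pairwise distances `d` between rows of `V`, and pass the upper-triangular entries of `D_ret` and `d` to `binned_mutual_information` for each `K`.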
Figure 2c shows the structure of stimulus space using MDS with $K = 2$, which already captures most of the structure of the stimulus space. The first coordinate of the MDS projection, $v^{(1)}$, is strongly correlated with the average firing rate in the population: high values correspond to "off-like" stimuli, and small values correspond to flat or "on-like" stimuli that do not drive the neural population well. The increased sensitivity to "off-like" features is consistent with the known prevalence of OFF-type cells in the salamander retina and in our dataset. Although "on-like" stimuli differ substantially in their shapes and thus in their Euclidean distances $D_2$, they are largely indistinguishable for the retina. In contrast, groups of stimuli sharing the same $v^{(1)}$ coordinate (yellow and green) have a similar shape and evoke a similar population firing rate, yet are distinguishable because they differ along the second coordinate $v^{(2)}$. To the retina, the yellow and green groups of stimuli are as distinct from each other as the blue and magenta groups, which lie at constant $v^{(2)}$ but different $v^{(1)}$.

FIG. 2: MDS assigns to each stimulus clip $s_i$ a $K$-dimensional vector $v_i = (v_i^{(1)}, \ldots, v_i^{(K)})$. a) The relationship between $D_{\rm ret}$ (one dot = one pair of stimuli) and the Euclidean norm in MDS space, $d(i,j) = \|v_i - v_j\|$, gets tighter with $K$. b) Information $I$ between $D_{\rm ret}$ and MDS distance $d$ as a function of embedding dimension $K$. c) For $K = 2$, all $T = 961$ stimulus clips are shown as points with MDS coordinates $(v^{(1)}, v^{(2)})$; shade of red corresponds to mean population firing rate (scale at right). Five groups of stimuli are denoted by circles (gray lines = individual stimulus clips, color line = average). Equal distances in the plane correspond to equal $D_{\rm ret}$.

FIG. 3: a) Sample clips from the stimulus space (5 highlighted in color). b) A model for $D_{\rm ret}$ maps each stimulus clip $s$ to a point $(v^{(1)}, v^{(2)})$ in the 2d MDS space. Predictions for $v^{(1,2)}$ are obtained by filtering the stimulus clip $s$ with a linear filter $k_{1,2}$ and passing the result through a pointwise nonlinearity (for $v^{(2)}$, the transformation is linear with a slope that increases with $k_1 \cdot s$, as shown). c) All $T = 961$ stimulus clips represented as points in a plane with coordinates predicted by the model in b (colored squares = highlighted stimuli). d) Given predictions for coordinates $v_i = (v_i^{(1)}, v_i^{(2)})$, $D_{\rm ret}$ for all ($\sim 4.6 \cdot 10^5$) pairs of clips $(s_i, s_j)$ can be predicted using a simple fitted relation shown in the inset (black line = mean binned model predictions vs. true $D_{\rm ret}$; shaded area = std of binned predictions). The red line shows $D_{\rm ret}$ computed from the true 2d MDS coordinates of Fig. 2c. Two highlighted distances from c are denoted by arrows. e) The fraction of information about the true $D_{\rm ret}$ captured by the $D_{\rm ret}$ model in b, by the best-fit global metric $g_{\mu\nu}$, and by the Euclidean distance, normalized by the success of the $K = 2$ MDS.

Figure 3 presents a model that predicts the mapping of an arbitrary stimulus $s$ into $(v^{(1)}, v^{(2)})$, and thus allows us to compute $D_{\rm ret}$. This model, obtained by simple reverse correlation analysis [3], relies on two coupled linear-nonlinear transformations (Fig. 3a-d) and identifies two dominant population-level stimulus features, $k_1$ and $k_2$: stimuli are distinguishable only insofar as their projections onto $k_1$ and $k_2$ differ. The model accurately predicts $D_{\rm ret}$ (Fig. 3d), establishing the relation of Eq. (2) to be $d(i,j) = \|v_i - v_j\| \approx \exp\{\alpha^{-1} D_{\rm ret}(s_i, s_j)\}$ (Fig. 3d, inset). Interestingly, this relation, which we did not assume a priori, is exactly the kernel function used in several very successful applications of support vector machine (SVM) classification in machine learning, where one needs to distinguish between "classes" (here, stimulus clips) based on the distributions over "features" (here, neural responses) that they induce [17]. Our findings indicate that the neural population, much like single neurons, performs low-order dimensionality reduction on the incoming stimuli; however, unlike single neurons, which can signal only a binary decision in every time bin, the population has access to $\sim 2^N$ states, which can encode the variation along the relevant directions with greater precision. This view is consistent with the reported highly redundant code in the retina [15].
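The structure of the model in Fig. 3b can be sketched in Python as two coupled LN stages. The fitted filters and nonlinearities are not given in the text, so this sketch uses random orthonormal filters in place of $k_1, k_2$ and hypothetical softplus-based nonlinearities chosen only to reproduce the qualitative shape: $v^{(1)}$ is a pointwise nonlinearity of $k_1 \cdot s$, and $v^{(2)}$ is linear in $k_2 \cdot s$ with a slope that grows with $k_1 \cdot s$. Converting the resulting $d(i,j)$ into predicted $D_{\rm ret}$ would further require the fitted relation of Fig. 3d, which is not reproduced here.

```python
import numpy as np

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

def embed(clips, k1, k2):
    """Map stimulus clips (rows) to 2-d coordinates with two coupled LN stages.

    HYPOTHETICAL nonlinearities, for illustration only:
      v1 = softplus(k1 . s)
      v2 = (0.5 + softplus(k1 . s)) * (k2 . s)   # linear in k2 . s; slope grows with k1 . s
    """
    x1 = clips @ k1
    x2 = clips @ k2
    v1 = softplus(x1)
    v2 = (0.5 + softplus(x1)) * x2
    return np.column_stack([v1, v2])

def embedded_distance(V):
    """Pairwise Euclidean distances d(i, j) = ||v_i - v_j|| in the 2-d space."""
    diff = V[:, None, :] - V[None, :, :]
    return np.linalg.norm(diff, axis=-1)

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(40, 2)))    # two orthonormal 40-d filters
k1, k2 = Q[:, 0], Q[:, 1]
clips = rng.normal(size=(5, 40))
d = embedded_distance(embed(clips, k1, k2))
```

Any stimulus change orthogonal to both filters leaves the embedding, and hence the predicted distance, unchanged, which is the sense in which stimuli are distinguishable only through their projections onto $k_1$ and $k_2$.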

Given the failure of the Euclidean metric to predict stimulus similarity, and the success of the low-dimensional model, we asked whether a general quadratic form could explain the retinal distance. We thus looked for an optimal matrix $g_{\mu\nu}$ such that

$$D_{\rm ret}(s_i, s_j) \approx \sum_{\mu,\nu} \left(s_i^{(\mu)} - s_j^{(\mu)}\right) g_{\mu\nu} \left(s_i^{(\nu)} - s_j^{(\nu)}\right),$$

where $\mu, \nu$ range over all 40 components of the stimulus clip vectors $s$. Using cross-validated least-squares fitting, we found that the optimal $g_{\mu\nu}$ substantially outperformed the Euclidean metric, yet still captured only $\sim 20\%$ of the structure in $D_{\rm ret}$ (Fig. 3e). The best-fitting matrix $g_{\mu\nu}$ has a simple structure that is captured by two eigenvectors, matching the pair of population-level stimulus features, $k_1$ and $k_2$, independently inferred in Fig. 3b. Despite this, the best-fitting static $g_{\mu\nu}$ performs poorly: the eigenvalues corresponding to $k_1$ and $k_2$ would have to depend on the stimulus in order to approximate our $D_{\rm ret}$ model well.
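Because the quadratic form above is linear in the entries of $g_{\mu\nu}$, fitting it reduces to ordinary least squares on products of stimulus-difference components. The Python sketch below shows this reduction, assuming a symmetric $g$ (so only the $40 \cdot 41/2 = 820$ upper-triangular entries are free for 40-component clips); it omits the cross-validation used in the text, and the function names are hypothetical.

```python
import numpy as np

def quad_features(dS):
    """Design matrix for D ~ dS^T g dS with symmetric g.

    dS: (n_pairs, m) array of stimulus differences s_i - s_j.
    Column for (mu, nu), mu <= nu, holds dS_mu * dS_nu; off-diagonal terms
    are doubled because g_mu_nu and g_nu_mu contribute equally.
    """
    m = dS.shape[1]
    iu, ju = np.triu_indices(m)
    weight = np.where(iu == ju, 1.0, 2.0)
    return dS[:, iu] * dS[:, ju] * weight, (iu, ju)

def fit_metric(dS, D):
    """Least-squares estimate of the symmetric matrix g from distances D."""
    X, (iu, ju) = quad_features(dS)
    coef, *_ = np.linalg.lstsq(X, D, rcond=None)
    g = np.zeros((dS.shape[1],) * 2)
    g[iu, ju] = coef
    g[ju, iu] = coef
    return g

def leading_eigenvectors(g, n=2):
    """Eigenvectors of g with the n largest-magnitude eigenvalues."""
    evals, evecs = np.linalg.eigh(g)
    order = np.argsort(np.abs(evals))[::-1][:n]
    return evals[order], evecs[:, order]
```

Comparing the output of `leading_eigenvectors(g)` with independently estimated features is one way to check the correspondence between the eigenvectors of $g_{\mu\nu}$ and $k_1, k_2$ described above.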

Our results carry important implications for stimulus decoding. The accuracy of our model for stimulus similarity enables us to create new stimuli that are similar to each other up to any desired distance. We used Monte Carlo simulation to generate ensembles of full-length stimuli, such that each clip from the generated stimulus is less than $\theta$ distant (as measured by $D_{\rm ret}$) from the corresponding clip in the original stimulus displayed in the experiment (Fig. 4a). Our analysis predicts that for small enough $\theta$, all stimuli from such an ensemble are essentially indistinguishable to the retina. In contrast to the Euclidean distance, which for a given threshold $\theta$ would allow the generated stimuli to fluctuate around the original waveform equally at every point in time, the retinal distance constrains the possible set of stimuli much more at certain times than at others, reflecting a preference of the retina for encoding specific features in the stimulus. This is in part due to compressive nonlinearities, which squeeze large segments of stimulus space with small feature overlap into a small volume as measured by retinal distance (Fig. 2c), while emphasizing stimulus differences with high feature overlap. Consequently, distance models with constant metrics (such as the Euclidean distance) are incapable of capturing the characteristics of the retinal distance. We conjecture that decoding approaches based on minimizing standard distance measures (e.g. [19]) may overly penalize deviations from aspects of the stimulus that are simply not represented in the neural responses, while not emphasizing strongly enough the deviations from the population-level stimulus features identified here.

FIG. 4: a) Two Monte Carlo-generated ensembles of long stimuli (ensemble mean = red; std = shade) where each clip of the generated stimulus is less than $\theta = 1, 10$ distant from the corresponding clip in the original stimulus (blue line). Below: std over the ensembles ($\theta = 1, 10$), showing how ensembles get more tightly constrained at particular instants corresponding to high neural activity (raster, bottom) and steep stimulus changes. b) The rise in coherence between the original stimulus and the simulated ensembles with decreasing $\theta$.
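The text does not spell out the Monte Carlo sampler, so the Python sketch below is one plausible construction, not the authors' procedure: a single-site Metropolis random walk that targets the Gaussian flicker prior (zero mean, unit variance, an assumption) restricted to the set of stimuli whose every clip stays within $\theta$ of the corresponding original clip. The clip distance is passed in as a function, so the SDME-based $D_{\rm ret}$ or any stand-in can be used. The clip convention (clip $t$ covers bins $t-W+1, \ldots, t$), the step size, and all names are hypothetical; bins before the first full clip are unconstrained.

```python
import numpy as np

def sample_ensemble(original, clip_distance, theta, W, n_samples,
                    n_sweeps=20, step=0.5, seed=0):
    """Draw stimuli whose every length-W clip is within theta of the original clip.

    Single-site Metropolis: propose x[j] += step * N(0, 1), accept with the
    Gaussian-prior ratio, then reject if any clip containing bin j violates
    clip_distance(new_clip, original_clip) <= theta.
    """
    rng = np.random.default_rng(seed)
    original = np.asarray(original, dtype=float)
    L = len(original)
    samples = []
    for _ in range(n_samples):
        x = original.copy()                 # start inside the constraint set
        for _ in range(n_sweeps * L):
            j = rng.integers(L)
            old = x[j]
            new = old + step * rng.normal()
            # Metropolis factor for the N(0, 1) prior on each luminance value
            if rng.random() >= np.exp(-0.5 * (new ** 2 - old ** 2)):
                continue
            x[j] = new
            # Only clips ending at t in [j, j + W - 1] contain bin j
            ok = all(
                clip_distance(x[t - W + 1:t + 1], original[t - W + 1:t + 1]) <= theta
                for t in range(max(j, W - 1), min(j + W, L))
            )
            if not ok:
                x[j] = old                  # reject: revert the move
        samples.append(x)
    return np.array(samples)
```

With a toy distance such as `lambda a, b: float((kf @ (a - b)) ** 2)` for a hypothetical length-`W` filter `kf`, the per-bin spread `samples.std(axis=0)` shows how tightly the constraint pins the stimulus at each instant, which is the quantity plotted below the ensembles in Fig. 4a.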

Understanding the neural code crucially depends on our ability to go beyond single-neuron spatio-temporal receptive fields and identify how many, and which, population-level features are present in an interacting population. We introduced a novel, biologically relevant distance measure on the space of stimuli based on the activity of large populations of neurons. This approach extends the single-neuron notions of stimulus feature extraction to the neural population, determining how a high-dimensional input space is partitioned and encoded by population responses. Our work thus suggests a principled alternative to arbitrary norms (like the Euclidean norm) for stimulus similarity and decoding, generalizes previous attempts to construct metrics for particular input spaces from neural responses (e.g. [21, 22]), and complements existing work on the dual problem of constructing relevant spike-train distance measures [23, 24].

The approach we presented here will be instrumental in the analysis of upcoming experiments, which allow the recording of large parts of sensory neural circuits, or even of all the cells encoding some parts of the sensory scene [25]. This approach can be immediately applied to other sensory modalities, where it could indicate, much as we have found here, that the "neural metric" deviates considerably from our intuitive notions of similarity. Moreover, it can be extended to sensory domains where we lack any obvious notion of similarity, e.g. olfaction, for which there exists no natural distance between chemical stimuli [20]. More broadly, as the neural metric is based on the spiking activity itself, this framework can be taken beyond sensory modalities, to study perceptual metrics as well (e.g. [26]), or used to define neural-based distances for motor behavior that would be critical for neural prosthesis applications [27-29].
for $P(\sigma|s)$, but the low-dimensional qualitative structure of the stimulus space identified in this paper can be robustly reproduced with non-SDME models; see Supplementary Information.



